Measuring dialect differences
Abstract
We measure varietal differences in general, and differences with respect to standard languages in particular (dialectality, in Herrgen/Schmidt's sense), in order to systematize observations about dialect differences, to make sense of exceptions, and to enable measurements based on randomly selected material, thus obviating issues of potential bias. Finally, measurements allow the characterization of abstract relations among language varieties. We illustrate some issues with simple techniques for categorical data introduced by Séguy and refined by Goebl, viz., issues concerning frequency, irrelevant variation, and competing forms. We proceed to measuring pronunciation differences, focusing on differences in the pronunciation of the same words in different varieties. Caution is needed to isolate pronunciation differences from differences in inflectional morphology, sandhi, and intonation. We characterize the difference between sound segments and develop a measure of the difference between the sequences of those segments in words, including insertions, deletions, and swaps (epenthesis, elision, and metathesis). Automating measurement techniques exposes the issue of validation, which lay largely unexamined in earlier dialectology. We propose to validate measurements based on the degree to which they correlate with dialect speakers' judgments of difference, justified by the presumed function of linguistic variation, that of signaling provenance.
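As a rough illustration of a sequence measure that counts insertions, deletions, and swaps, the following sketch computes an edit distance between two segment sequences. It is not taken from the paper: the abstract does not name an algorithm, so this uses a standard restricted Damerau-Levenshtein (optimal string alignment) recurrence. The names `sequence_distance` and `unit_segment_cost`, the unit costs, and the treatment of each character as one segment are all assumptions made for the example. A graded segment difference could be supplied through the `seg_cost` parameter.

```python
from typing import Callable, Sequence


def unit_segment_cost(a: str, b: str) -> float:
    """Hypothetical segment difference: 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def sequence_distance(
    x: Sequence[str],
    y: Sequence[str],
    seg_cost: Callable[[str, str], float] = unit_segment_cost,
    indel_cost: float = 1.0,  # assumed cost of one insertion or deletion
    swap_cost: float = 1.0,   # assumed cost of swapping two adjacent segments
) -> float:
    """Edit distance with insertion, deletion, substitution, adjacent swap."""
    n, m = len(x), len(y)
    # d[i][j] holds the distance between the prefixes x[:i] and y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + indel_cost,                        # deletion (elision)
                d[i][j - 1] + indel_cost,                        # insertion (epenthesis)
                d[i - 1][j - 1] + seg_cost(x[i - 1], y[j - 1]),  # substitution or match
            )
            # Adjacent swap (metathesis): the last two segments are exchanged.
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                best = min(best, d[i - 2][j - 2] + swap_cost)
            d[i][j] = best
    return d[n][m]


print(sequence_distance(list("ask"), list("aks")))     # metathesis → 1.0
print(sequence_distance(list("film"), list("filəm")))  # epenthesis → 1.0
```

Passing sequences rather than strings lets multi-character segments (for example, a base symbol plus diacritic) be treated as single units, which plain string comparison would not do.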
Similar resources
A Quantitative Analysis of Bulgarian Dialect Pronunciation
We apply a computational measure of pronunciation difference to a database of 36 word pronunciations from 490 sites throughout Stoykov's Bulgarian Dialect Atlases. The result is a comprehensive view of the aggregate pronunciation differences among the 490 sites. This study aims to contribute therefore to Bulgarian dialectology, as well as to the development and testing of the computational techni...
Dialect loss and dialect vitality in Flanders
Dialect loss is a relatively new but by now quite general phenomenon in Flanders (i.e., Dutch-speaking Belgium). Although the processes of dialect change and dialect loss have proceeded with great regional differences in speed and intensity in the past decades, there is a general tendency toward replacing primary dialect features of a relatively local scope by secondary dialect features that hav...
The use of shibboleth words for automatically classifying speakers by dialect
Real-world applications using speech recognition must perform well over a range of dialects. Differences in dialect between the speakers in the training database and the target users often leads to degraded recognition performance. For the BBN Hark Hidden Markov Model (HMM) based system, we have already developed a reasonably effective technique [1] for dealing with multiple US dialects. The solu...
Dialects in western Europe: a balanced picture of language death, innovation, and change
This thematic issue of the International Journal of the Sociology of Language addresses the question of whether dialects in western Europe are dying. Can dialects still be a medium of communication in our industrialized and increasingly urbanized societies? Is there a place for dialects in a globalizing world? And what kind of dialect do we speak right now and shall we be speaking in the near f...
Estimation of protein-production levels for ORFs found by E. coli and yeast genome projects, basing on levels of "optimal codon" usage, in connection with feasibility of their protein coding ability and with assignment to foreign-type genes
function; "codon dialect" found for individual unicellular organisms [1]. Taxonomically related organisms have similar dialects but those distantly related have distinct ones. For example, characteristics of E. coli codon-choice (E. coli dialect) differ considerably from those of yeast S. cerevisiae, but are similar to those of Salmonella. By measuring cellular tRNA contents of these three speci...